TECHNICAL FIELD
[0001] The invention pertains to video processing. 



BACKGROUND 

[0002] Video motion provides useful cues in understanding video content. 
As a result, research efforts are increasingly relying on semantic event analysis to 
obtain video structures and indices. As one of the important cues for semantic event
analysis, compact and effective motion representation is indispensable for video 
analysis, especially for sports videos. However, conventional semantic video 
analysis techniques do not adequately utilize video motion information due to its 
complexities and the lack of effective motion representations. As a result many 
video motion events go undetected and unanalyzed. 

SUMMARY 

[0003] Systems and methods for representing sequential motion patterns are 
described. In one aspect, video frames are converted into a sequence of energy 
redistribution (ER) measurements. One or more motion filters are then applied to 
the ER measurements to generate one or more temporal sequences of motion 
patterns, the number of temporal sequences being a function of the number of 
motion filters. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0004] In the figures, the left-most digit of a reference number identifies the 
particular figure in which the referenced feature first appears. 

Lee & Hayes, PLLC 1 Atty Docket No. MS1-1601US 

(509) 324-9256 



[0005] Fig. 1 is a block diagram of an exemplary computing environment 
within which systems and methods for representing sequential motion patterns 
may be implemented. 

[0006] Fig. 2 shows computer-program module and program data aspects of 
the system of Fig. 1 for representing sequential motion patterns. 

[0007] Fig. 3 shows an exemplary video set of video frames with a motion 
vector field indicating a redistribution of energy from one frame to another frame. 

[0008] Figs. 4-6 show an exemplary set of video frames, exemplary 
motion filters that have been applied to calculate motion energy redistributions 
associated with respective ones of the frames, and resulting filtering responses, or 
"temporal sequences". In particular, Fig. 4 shows an exemplary input video 
frame, an exemplary motion filter that was specifically designed to detect
horizontal motion, and three temporal sequences, of which the bolded temporal
sequence (a single dimension of an n-dimensional observation vector) was derived
via application of the exemplary motion filter. Fig. 5 shows an exemplary input 
video frame, an exemplary motion filter that was specifically designed to detect 
vertical motion, and three temporal sequences, of which the bolded temporal 
sequence (a single dimension of an n-dimensional observation vector) was derived 
via application of the exemplary motion filter. Fig. 6 shows an exemplary input 
video frame, an exemplary motion filter that was specifically designed to detect 
radial motion, and three temporal sequences, of which the bolded temporal 
sequence (a single dimension of an n-dimensional observation vector) was derived 
via application of the exemplary motion filter. 

[0009] Fig. 7 shows exemplary correspondences between a number of
semantic or conceptual events (training samples) and automatically detected 
motion patterns (temporal sequences) from a basketball video. 

[0010] Fig. 8 shows an exemplary procedure for representing sequential 
motion patterns from a video data source with an n-dimensional observation 
vector. A statistical model for sequential pattern analysis is applied to the 
represented motion patterns to map semantics to the represented motion events. 

DETAILED DESCRIPTION 

Overview 

[0011] Conventional semantic video analysis techniques do not adequately 
represent the spatio-temporal complexities of video motion information. As a 
result many video motion events go undetected and unanalyzed. To address this 
problem, systems and methods of the invention convert the video sequence into a 
sequence of energy redistribution. A number of motion filters are generated 
according to the primary motion in the video, wherein each motion filter is 
responsive to a particular type of dominant motion. The motion filters are applied
to the motion energy redistribution sequence of the video. This converts the energy
measurements into a temporal sequence of filter responses (i.e., sequential signal 
responses) in which distinct temporal motion patterns corresponding to high-level 
concepts are present. In this manner, the spatio-temporal aspects of sequential 
motion are represented. Such a representation can be analyzed by sequential 
processing methods, for semantic motion pattern event recognition. 

Exemplary Operating Environment

[0012] Turning to the drawings, wherein like reference numerals refer to 
like elements, the invention is illustrated as being implemented in a suitable 
computing environment. Although not required, the invention is described in the 
general context of computer-executable instructions, such as program modules, 
being executed by a personal computer. Program modules generally include 
routines, programs, objects, components, data structures, etc., that perform 
particular tasks or implement particular abstract data types. 

[0013] Fig. 1 illustrates an example of a suitable computing 
environment 100 on which the subsequently described systems, apparatuses and 
methods for representing sequential motion patterns may be implemented (either 
fully or partially). Exemplary computing environment 100 is only one example of 
a suitable computing environment and is not intended to suggest any limitation as 
to the scope of use or functionality of the systems and methods described herein.
Neither should computing environment 100 be interpreted as having any 
dependency or requirement relating to any one or combination of components 
illustrated in computing environment 100. 

[0014] The methods and systems described herein are operational with 
numerous other general purpose or special purpose computing system 
environments or configurations. Examples of well known computing systems, 
environments, and/or configurations that may be suitable for use include, but are 
not limited to, personal computers, server computers, multiprocessor systems, 
microprocessor-based systems, network PCs, minicomputers, mainframe 
computers, distributed computing environments that include any of the above 
systems or devices, and so on. Compact or subset versions of the framework may
also be implemented in clients of limited resources, such as handheld computers, 
or other computing devices. The invention may also be practiced in distributed 
computing environments where tasks are performed by remote processing devices 
that are linked through a communications network. In a distributed computing 
environment, program modules may be located in both local and remote memory 
storage devices. 

[0015] With reference to Fig. 1, an exemplary system for representing 
sequential motion patterns includes a general purpose computing device in the 
form of a computer 110. Components of computer 110 may include, but are not 
limited to, a processing unit 120, a system memory 130, and a system bus 121 that 
couples various system components including the system memory to the 
processing unit 120. The system bus 121 may be any of several types of bus 
structures including a memory bus or memory controller, a peripheral bus, and a 
local bus using any of a variety of bus architectures. By way of example, and not 
limitation, such architectures include Industry Standard Architecture (ISA) bus, 
Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video 
Electronics Standards Association (VESA) local bus, and Peripheral Component 
Interconnect (PCI) bus also known as Mezzanine bus. 

[0016] Computer 110 typically includes a variety of computer readable 
media. Computer readable media can be any available media that can be accessed 
by computer 110 and includes both volatile and nonvolatile media, removable and 
non-removable media. By way of example, and not limitation, computer readable 
media may comprise computer storage media and communication media. 
Computer storage media includes volatile and nonvolatile, removable and
non-removable media implemented in any method or technology for storage of
information such as computer readable instructions, data structures, program 
modules or other data. Computer storage media includes, but is not limited to, 
RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, 
digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, 
magnetic tape, magnetic disk storage or other magnetic storage devices, or any 
other medium which can be used to store the desired information and which can be 
accessed by computer 110. 

[0017] Communication media typically embodies computer readable 
instructions, data structures, program modules or other data in a modulated data 
signal such as a carrier wave or other transport mechanism and includes any 
information delivery media. The term "modulated data signal" means a signal that 
has one or more of its characteristics set or changed in such a manner as to encode 
information in the signal. By way of example, and not limitation, communication 
media includes wired media such as a wired network or direct-wired connection, 
and wireless media such as acoustic, RF, infrared and other wireless media. 
Combinations of any of the above should also be included within the scope of
computer readable media. 

[0018] System memory 130 includes computer storage media in the form of 
volatile and/or nonvolatile memory such as read only memory (ROM) 131 and 
random access memory (RAM) 132. A basic input/output system 133 (BIOS), 
containing the basic routines that help to transfer information between elements 
within computer 110, such as during start-up, is typically stored in ROM 131. 
RAM 132 typically contains data and/or program modules that are immediately
accessible to and/or presently being operated on by processing unit 120. By way 
of example, and not limitation, Fig. 1 illustrates operating system 134, application 
programs 135, other program modules 136, and program data 137. 

[0019] The computer 110 may also include other removable/non-removable, 
volatile/nonvolatile computer storage media. By way of example only, Fig. 1 
illustrates a hard disk drive 141 that reads from or writes to non-removable, 
nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to 
a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that 
reads from or writes to a removable, nonvolatile optical disk 156 such as a CD 
ROM or other optical media. Other removable/non-removable, 
volatile/nonvolatile computer storage media that can be used in the exemplary 
operating environment include, but are not limited to, magnetic tape cassettes, 
flash memory cards, digital versatile disks, digital video tape, solid state RAM, 
solid state ROM, and the like. The hard disk drive 141 is typically connected to 
the system bus 121 through a non-removable memory interface such as 
interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically 
connected to the system bus 121 by a removable memory interface, such as 
interface 150. 

[0020] The drives and their associated computer storage media discussed 
above and illustrated in Fig. 1, provide storage of computer readable instructions, 
data structures, program modules and other data for the computer 110. In Fig. 1, 
for example, hard disk drive 141 is illustrated as storing operating system 144, 
application programs 145, other program modules 146, and program data 147.
Note that these components can either be the same as or different from operating 
system 134, application programs 135, other program modules 136, and program 
data 137. Operating system 144, application programs 145, other program 
modules 146, and program data 147 are given different numbers here to illustrate 
that they are at least different copies. 

[0021] A user may enter commands and information into the computer 110 
through input devices such as a keyboard 162 and pointing device 161, commonly 
referred to as a mouse, trackball or touch pad. Other input devices (not shown) 
may include a microphone, joystick, game pad, satellite dish, scanner, or the like. 
These and other input devices are often connected to the processing unit 120 
through a user input interface 160 that is coupled to the system bus 121, but may 
be connected by other interface and bus structures, such as a parallel port, game 
port or a universal serial bus (USB). 

[0022] A monitor 191 or other type of display device is also connected to 
the system bus 121 via an interface, such as a video interface 190. In addition to 
the monitor, computers may also include other peripheral output devices such as 
speakers 197 and printer 196, which may be connected through an output 
peripheral interface 195. A camera 192 (such as a digital/electronic still or video 
camera, or film/photographic scanner) capable of capturing a sequence of 
images 193 may also be included as an input device to the computing device 110. 
Further, while just one camera is depicted, multiple cameras could be included as 
input devices to the computing device 110. The images 193 from the one or more 
cameras 192 are input into the computer 110 via an appropriate camera 
interface 194. This interface 194 is connected to the system bus 121, thereby 
allowing the images to be routed to and stored in the RAM 132, or one of the other
data storage devices associated with the computer 110. However, it is noted that 
image data can be input into the computer 110 from peripheral devices different 
than a camera 192, for example via any of the aforementioned computer-readable 
media. 

[0023] The computer 110 may operate in a networked environment using 
logical connections to one or more remote computers, such as a remote 
computer 180. The remote computer 180 may be a personal computer, a server, a 
router, a network PC, a peer device or other common network node, and typically 
includes many or all of the elements described above relative to the computer 110, 
although only a memory storage device 181 has been illustrated in Fig. 1. The
logical connections depicted in Fig. 1 include a local area network (LAN) 171 and 
a wide area network (WAN) 173, but may also include other networks. Such 
networking environments are commonplace in offices, enterprise-wide computer
networks, intranets and the Internet. 

[0024] When used in a LAN networking environment, the computer 110 is 
connected to the LAN 171 through a network interface or adapter 170. When used 
in a WAN networking environment, the computer 110 typically includes a 
modem 172 or other means for establishing communications over the WAN 173, 
such as the Internet. The modem 172, which may be internal or external, may be 
connected to the system bus 121 via the user input interface 160, or other 
appropriate mechanism. In a networked environment, program modules depicted 
relative to the computer 110, or portions thereof, may be stored in the remote 
memory storage device. By way of example, and not limitation, Fig. 1 illustrates 
remote application programs 185 as residing on memory device 181. The network
connections shown are exemplary and other means of establishing a 
communications link between the computers may be used. 

Exemplary Program Modules and Data 

[0025] Fig. 2 shows computer-program module and program data aspects of 
the system of Fig. 1 for representing sequential motion patterns. For purposes of 
discussion, the features of Fig. 2 are described while referring to one or more 
features of Fig. 1. In particular, Fig. 2 shows that application programs 135 
portion of the system memory 130 includes, for example, sequential motion 
pattern representation ("SMPR") module 202. The SMPR module 202 processes 
input video data 204 to represent motion between video frames as energy 
redistribution ("ER") measurement(s) 206. The SMPR module 202 then utilizes
one or more motion filters 208 to convert the ER measurements 206 into temporal 
sequences 210 representing sequential motion patterns. 

Energy Redistribution between Video Frames 
[0026] Fig. 3 shows an exemplary video set of video frames 300 with a 
motion vector field ("MVF") 302 indicating a redistribution of motion energy
from one frame to another frame. Usually, each frame is divided into NxM blocks.
Each block has one motion vector which indicates the displacement of this block
between the two consecutive frames. All motion vectors in a frame constitute an
MVF. An MVF is considered an outside force that causes energy exchange
between video frames. For this reason, each block when using a block-based 
motion estimation algorithm (e.g., block 306, delimited with dashed lines) is
viewed as a basic energy container. The change of energy distribution indicated by
an MVF is determined to exhibit motion features. In this implementation, all
blocks in an initial frame are considered to have the same amount of energy. The
redistribution of energy is illustrated via the respective energy distribution
percentages of block 306, which are 40%, 25%, 25%, and 10%.

[0027] Referring to Fig. 2, to generate the video motion ER 
measurements 206, the SMPR module 202 derives MVFs between video frames. 
For purposes of discussion, the MVFs are represented in respective portions of
"other data" 222. Energy distributions depend only on the position of the
corresponding block in the next video frame. For example, energy at block (x, y)
is denoted by E_{x,y}. An energy distribution update function changes the energy
distribution in each block by applying the MVF. When a new MVF
is considered, the energy in the blocks is calculated by equation (1), below.
Therefore, the computation of energy values in blocks can be viewed as an updating
process, exemplified with the following equation:

E_{x,y} = \frac{\sum_{i,j} (overlapS_{i,j,x,y} \times E_{i,j})}{W_b^2}    (1)

In equation (1), overlapS_{i,j,x,y} denotes the overlap portion of the rectangular
region corresponding to block (i, j) in a previous frame and block (x, y) in the current
frame. W_b represents the size of blocks. If blocks move out of the frame boundary,
the blocks are placed just inside the frame by decreasing the magnitude of the MVF to
keep the amount of energy within the frame.
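
The update of equation (1) can be sketched as follows. This is a minimal illustration only, assuming square blocks of side W_b pixels, one pixel displacement (dx, dy) per block, and a simple clamp as the way of keeping displaced blocks inside the frame. The function name redistribute_energy and its argument layout are hypothetical and are not taken from the described implementation.

```python
def redistribute_energy(energy, mvf, block_size):
    """Apply one motion vector field to a grid of block energies.

    energy: rows x cols list of floats; energy[y][x] is the energy of block (x, y).
    mvf: rows x cols list of (dx, dy) pixel displacements, one per block (assumed layout).
    block_size: W_b, the side length of a square block in pixels.
    """
    rows, cols = len(energy), len(energy[0])
    width, height = cols * block_size, rows * block_size
    updated = [[0.0] * cols for _ in range(rows)]
    for j in range(rows):
        for i in range(cols):
            dx, dy = mvf[j][i]
            # Clamp the displaced block so that it stays inside the frame,
            # which keeps the total amount of energy within the frame.
            left = min(max(i * block_size + dx, 0), width - block_size)
            top = min(max(j * block_size + dy, 0), height - block_size)
            # A displaced block overlaps at most four blocks of the grid.
            x0, y0 = int(left // block_size), int(top // block_size)
            for y in (y0, y0 + 1):
                for x in (x0, x0 + 1):
                    if x >= cols or y >= rows:
                        continue
                    overlap_w = (min(left + block_size, (x + 1) * block_size)
                                 - max(left, x * block_size))
                    overlap_h = (min(top + block_size, (y + 1) * block_size)
                                 - max(top, y * block_size))
                    if overlap_w > 0 and overlap_h > 0:
                        # Overlap area times source energy, divided by W_b squared.
                        share = overlap_w * overlap_h / block_size ** 2
                        updated[y][x] += share * energy[j][i]
    return updated
```

In this sketch a block that moves half a block width to the right hands half of its energy to its right-hand neighbor, and the sum of all block energies is unchanged by the update.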

[0028] In particular, the SMPR module 202 uses a sliding window to
calculate each respective ER measurement 206 for each frame of the video 204.
Each frame is added into the sliding window one by one, until the last frame in the
window is reached. This process produces the sequence of energy redistribution
measurements 206. With respect to configuring the sliding window, the energy 
distribution in the first frame of the sliding window is evenly assigned. For 
example, in one implementation, energy values in all blocks are assigned a value 
of one (1). Accordingly, the "initial frame", as mentioned above, is the first frame, 
whose energy values are fabricated. This provides a reasonable assumption 
without biases. 

[0029] The width w of the sliding window and the sampling frequency v,
defined by the number of skipped frames when the window slides, are
configurable parameters that can be changed to achieve desired accuracy of the 
results. That is, the computation complexity and the performance of the procedure 
to represent sequential motion patterns in video data can be configured by 
adjusting these two parameters, both of which are represented in respective 
portions of "other data" 222. 
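
The sliding-window procedure can be sketched as follows. This is an illustration under stated assumptions: the parameter step stands in for the sampling frequency v and is taken here to be the number of frames by which the window advances, and update is any caller-supplied function that applies one MVF to an energy grid. The names er_sequence, step, and update are hypothetical.

```python
def er_sequence(mvfs, rows, cols, window_width, step, update):
    """Produce one energy redistribution measurement per window position.

    mvfs: per-frame motion vector fields, in temporal order.
    window_width: w, the number of frames accumulated in each window.
    step: frames the window advances between measurements (assumed reading of v).
    update: function (energy, mvf) -> new energy grid.
    """
    measurements = []
    for start in range(0, len(mvfs) - window_width + 1, step):
        # The first frame of every window starts from an even distribution:
        # each block is assigned an energy value of one.
        energy = [[1.0] * cols for _ in range(rows)]
        # Frames are added to the window one by one until the last is reached.
        for mvf in mvfs[start:start + window_width]:
            energy = update(energy, mvf)
        measurements.append(energy)
    return measurements
```

A wider window means more update calls per measurement, and a smaller step means more measurements per video, which is the cost side of the trade-off between accuracy and computation.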

[0030] For example, the larger the sliding window is, the more
computation each frame involves. Similarly, the higher the sampling
frequency is, the higher the computational complexity of sequential curve
generation is. However, the final event recognition accuracy may decrease if the
sliding window size is too small or the sampling frequency is too low.
Consequently, a trade-off between accuracy and computational complexity
can be made according to the application scenario. This implementation
of the system is not sensitive to the two parameters.

[0031] In this implementation, each MVF is determined by known block-based
motion estimation algorithms (BMA). Though the real motion often cannot
be obtained by BMA, the loss has been determined to be very small (i.e., trivial),
as compared to the efficiency gained by utilizing BMAs, especially when the input
video data 204 is in a Moving Pictures Expert Group ("MPEG") data format, or 
some other data format, wherein MVFs are readily available. When the input 
video data 204 is in a data format wherein MVFs are not readily available, the 
MVFs can be determined via any motion estimation algorithms, such as traditional 
block-based motion estimation algorithms. 

Motion Filtering to Identify Motion Types from Frame Energy Redistributions 
[0032] Energy redistribution function (1), as described above, provides a 
way to represent motion between two frames. The SMPR module 202 then 
processes the ER measurements 206 to characterize them as pertaining to one or 
more of particular types of motion in a spatio-temporal data format. To this end, 
the SMPR module 202 applies a number of motion filters 208 to the ER 
measurements 206 to generate temporal motion pattern 210 responses. A motion
filter 208 is a respective weight matrix with the same dimensions as the number of
blocks in a frame of video data input 204. By arranging the weights of a motion 
filter's corresponding matrix with different values and/or value ordering, the filter 
designer changes the sensitivity of the motion filter to different motion forms. In 
other words, a weight matrix can be designed to specifically identify a particular
type of motion. 

[0033] Elements in a weight matrix (i.e., motion filter 208) are denoted by
w_{x,y}. Subsequent to calculation of an ER measurement 206 from respective ones
of the video frames, as discussed above, the SMPR module 202 applies each such
motion filter 208 to the ER measurement. The resulting temporal energy response
of the frame is defined as follows:

E_R = \sum_{i,j} E_{i,j} \times w_{i,j}    (2)

Over time, responses calculated via equation (2) quantitatively represent
corresponding motion events. A combination of energy responses produced via a
particular motion filter 208 generates a respective sequential motion curve, or
temporal sequence 210.
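
Equation (2) and the resulting temporal sequences can be sketched as follows. The weight values in the usage note are purely illustrative, since no concrete filter weights are given here, and the function names filter_response and temporal_sequences are hypothetical.

```python
def filter_response(energy, weights):
    """Equation (2): sum over all blocks of block energy times filter weight."""
    return sum(
        e * w
        for energy_row, weight_row in zip(energy, weights)
        for e, w in zip(energy_row, weight_row)
    )


def temporal_sequences(er_measurements, filters):
    """Return one response curve per motion filter.

    er_measurements: energy grids in temporal order.
    filters: weight matrices, each with the same dimensions as an energy grid.
    """
    return [
        [filter_response(energy, weights) for energy in er_measurements]
        for weights in filters
    ]
```

As an assumed example, a filter whose weights grow from left to right, such as [[-1, 0, 1], [-1, 0, 1]], gives a zero response for an even energy grid and a positive response once energy has shifted toward the right-hand blocks. The number of curves returned always equals the number of filters applied.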

[0034] Figs. 4-6 show an exemplary set of video frames 204, exemplary
motion filters 208 that have been applied to calculated motion energy 
redistributions 206 associated with respective ones of the frames, and resulting 
filtering responses, or "temporal sequences" 210. In the figures, the left-most 
digit of a reference number identifies the particular figure in which the referenced 
feature first appears. Thus, a frame 204 represents a portion of the video data 
input 204 of Fig. 2, a motion filter 208 is a motion filter 208 of Fig. 2, and so on. 
The figures illustrate that the temporal sequences for each frame will contain the 
same number of sequential feature curves as the number of motion filters 208 that 
have been applied to the ER measurements 206 associated with the frame. 

[0035] In this implementation, the number of motion filters 208 applied to
the ER measurements 206 is three (3). Thus, each graph illustrates 3 sequential 
feature curves. A crest on a curve indicates the presence and direction of a certain 
type of motion — the particular type of motion being a function of the values in the 
respective weighting matrix utilized to generate the curve. Although only 3 
motion filters are utilized in the examples of Figs. 4-6, any number of motion 
filters can be designed and applied to the frames, the number being a function of 
the different types of motion that are to be characterized. For instance, if the input 
video data 204 is a sport video, the specific types of motion corresponding to the 
sport video can be represented by respective ones of the motion filters 208. The 
bold curve in each respective graph has been generated by applying a particular 
type of motion filter 208 to the corresponding video frame. The type and shape of 
the crests identify the direction and characteristics of the identified motions. For 
instance, Figs. 4-6 provide three examples of how type and shape of the crests can 
identify direction and characteristics of the identified motions— i.e. passing a ball 
(horizontal motion), panning up (vertical motion), and zoom-out (radial motion). 

[0036] Referring to Fig. 4, there is shown an exemplary input video frame
(one frame of input video data 204), an exemplary motion filter 208 specifically 
designed to detect horizontal motion, and three temporal sequences 410, of which 
the bolded temporal sequence was derived via application of the exemplary 
motion filter 208. Fig. 5 shows an exemplary input video frame (one frame of 
input video data 204), an exemplary motion filter 208 specifically designed to 
detect vertical motion, and three temporal sequences 410, of which the bolded 
temporal sequence was derived via application of the exemplary motion filter 208. 
Fig. 6 shows an exemplary input video frame (one frame of input video data 204), an
exemplary motion filter 208 specifically designed to detect radial motion, and 
three temporal sequences 410, of which the bolded temporal sequence was derived 
via application of the exemplary motion filter 208. Referring to any of Figs. 4-6, 
notice that the number of sequential feature curves (temporal sequences 210, or 
filter responses) is equal to three (3), which matches the number of motion 
filters 208 that have been applied in this implementation to calculated ER 
measurements 206 associated with a sequence of frames — of which the 
corresponding frame 204 is but one exemplary frame. That is, the frames 204 in
Figs. 4, 5, and 6 each represent a key frame or representative frame of one of the
three typical motions or clips, which are used to display the content of sample
clips to users.

N-Dimensional Observation Vectors 
[0037] For each temporal sequence 210 associated with a respective motion 
filter 208, the SMPR module 202 calculates a respective mean energy value of the 
temporal sequence 210. The mean energy is the average energy within the sliding
window. Such mean energy is considered to be smoother, and thus to provide a more
accurate representation of the individual frame's energy for a statistical method.
Such mean energy values are represented via respective portions of "other data" of
Fig. 2. To represent the motion of the frame sequence in the window, the SMPR
module 202 combines the mean energy values to create an observation vector 216.
The observation vector is n-dimensional, wherein n is the number of motion
filters 208 multiplied by a factor of two (2). For example, if there are 3 original
curves generated by motion filters, the corresponding 3 difference curves are
computed as the differences between consecutive values. Both the original
curves and the difference curves serve as input to the Hidden Markov Model (HMM)
discussed below. The factor of two accounts for both the original and the
difference temporal sequences 210 generated by the SMPR module 202 when
determining the ER measurements 206.

[0038] For instance, in the examples of Figs. 4-6, three (3) motion 
filters 208 are utilized. So, 3 original and 3 difference temporal sequences 210 are 
combined into a six-dimension observation vector. Thus, any one of the bolded 
temporal sequences 210 of Figs. 4-6 is representative of the information utilized to
calculate a single dimension of an n-dimensional observation vector. In this 
example, bolded curves indicate the most responsive curves according to the 
different motion filters, respectively. Figs. 4-6 show the 3 original curves which 
are computed by three motion filters. 
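
The construction of the n-dimensional observation vectors can be sketched as follows. The sketch assumes that each input curve already holds the mean energy values of one motion filter, and that the difference at the first time step is set to zero. Neither the function name observation_vectors nor the zero-padding choice comes from the described implementation.

```python
def observation_vectors(curves):
    """Combine original and difference curves into one vector per time step.

    curves: one list of (mean) filter responses per motion filter, equal lengths.
    Returns vectors of dimension n = 2 * len(curves).
    """
    length = len(curves[0])
    vectors = []
    for t in range(length):
        originals = [curve[t] for curve in curves]
        if t == 0:
            # No previous value exists, so the difference is assumed to be zero.
            differences = [0.0] * len(curves)
        else:
            # Difference curve: change between consecutive values.
            differences = [curve[t] - curve[t - 1] for curve in curves]
        vectors.append(originals + differences)
    return vectors
```

With three motion filters, each returned vector has six components: three original values followed by three difference values.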

[0039] In this manner, ER measurements 206 are calculated from video 
frame sequences. Motion filters 208 designed to detect specific types of motion 
are applied to the ER measurements 206 to generate temporal sequences 210. The 
temporal sequences 210 represent sequential motion patterns across the video data. 
Mean energy values of the temporal sequences 210 are determined to generate an 
n-dimensional observation vector 216 to represent the sequential motion patterns 
in the video. 

Semantic Characterization of an N-Dimensional Observation Vector 
[0040] Referring to Fig. 2, application programs 135 further includes 
semantic video analysis ("SVA") module 212 to evaluate the observation 
vector(s) 216 and characterize/map at least a subset of the information contained
in the vectors as representative of specific semantic events/content 214. In this
implementation, statistical sequential signal processing (SSSP) module 218 uses a
single Gaussian Hidden Markov Model (HMM), although other SSSP techniques such as
Time-Delayed Neural Networks, Dynamic Bayesian Networks, and/or the like, could
also have been used.
An HMM is a statistical model for analysis of sequential data. Statistical models
for sequential data such as the continuous single Gaussian HMM are known and
often used in speech recognition technology, but not in semantic video content
analysis. To provide such a novel framework, shots extracted from an input video
sequence 204 are considered analogous to sentences in speech, and event clips
are considered representative of words. Relationships between the energy
values indicated by the n-dimensional observation vector(s) 216 and actual
semantic events are thus determined via sentence grammar rules. Thus, the
completely connected HMMs will have n states, one state for each dimension of the
n-dimensional observation vector, plus begin and exit states.
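
How a single Gaussian HMM scores a sequence of observation vectors can be sketched with the standard forward algorithm in the log domain. This is a generic textbook formulation, not the specific model described here: it assumes one Gaussian with diagonal covariance per state and omits the explicit begin and exit states. All function and parameter names are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at observation x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_likelihood(observations, log_start, log_trans, means, variances):
    """Forward algorithm: log probability of the observations under one HMM.

    log_start[s]: log probability of starting in state s.
    log_trans[p][s]: log probability of moving from state p to state s.
    means[s], variances[s]: Gaussian parameters of state s (one Gaussian per state).
    """
    n_states = len(log_start)
    alpha = [
        log_start[s] + log_gaussian(observations[0], means[s], variances[s])
        for s in range(n_states)
    ]
    for obs in observations[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n_states)])
            + log_gaussian(obs, means[s], variances[s])
            for s in range(n_states)
        ]
    return logsumexp(alpha)
```

Under these assumptions, one model would be trained per semantic event, and a clip would be assigned to the event whose model yields the highest log likelihood for the clip's observation vectors.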

Training the HMMs 
[0041] Fig. 7 shows exemplary correspondences between a number of 
semantic or conceptual events (training samples) and automatically detected 
motion patterns (temporal sequences 210) from a basketball video 204. Portion
702 of Fig. 7 indicates temporal sequences 210 that map to a basketball game
tracking concept. Portion 704 indicates temporal sequences 210 that map to a
lay-up concept. Portions 706 and 708 indicate two respective portions of the
temporal sequences 210 that respectively map to different types of "wipe". A
wipe is a kind of shot transition. Two shots can be connected in numerous ways
including cut (abrupt transition, without any additional information), dissolve
(gradual transition), fade-in/fade-out, and wipe (in which there usually is an
overlay sub-image flying through the scene). As shown, such correspondences or
characterizations are a function of the detected motion events and the subject 
matter represented by the video. Other types of conceptual categories that could 
be defined in an exemplary implementation of the described systems and methods 
for representing sequential video motion patterns in a video of a basketball game 
include, for example: team offence at left or right court, fast break to the left or 
right, lay-up at left/right court, shot at left/right court, tracking player to left/right, 
lay-up in close-up view, shot/foul shot in close-up view, close-up, wipe, stillness, 
and so on. 

[0042] Representative semantic events, such as the exemplary events listed 
above for a sports event, are each characterized as a "minimal recognition unit" 
(MRU), each event being substantially self-contained so that the event can be 
characterized by the statistical model as a respective grammatical sentence(s). 
Training module 220 trains an HMM for each MRU from shot transcriptions that 
are manually prepared based on the defined events. In one implementation, to 
avoid over-segmentation resulting in short segments that could be meaningless 
for human understanding, a number of post-processing rules are defined and 
applied. 

[0043] Post-processing rules are defined according to specific applications. 
For example, in basketball video analysis, the rules may be listed as follows: 
1) If the duration of a segment is less than 15 frames, the segment is marked 
as an invalid concept. 

2) If the invalid concept is "tracking", the label is changed to "close-up". 

3) If the invalid concept is "fast break", nearby labels will carry an 
erroneous half-court for the following semantic net. The issue occurs in 
two cases: correction (prior error) and mistake (posterior error). Search 
up from the next label to find the first attachable label, and search down 
from the previous label. Determine the error type by the minimal distance, 
and change the labels to the proper half-court. 

4) If the invalid concept is any other concept, simply merge it into the next 
label. 

5) Merge adjacent identical labels, except "layup". 
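
The rules above can be illustrated with a minimal sketch, offered only as one possible reading. The segment representation (a list of label and frame-count pairs), the function name postprocess, and the handling of a trailing invalid segment are assumptions made for illustration. Rule 3 is intentionally not modelled, because it depends on the semantic net and half-court labels, so invalid "fast break" segments pass through unchanged.

```python
MIN_FRAMES = 15  # rule 1 threshold stated in the text


def postprocess(segments, min_frames=MIN_FRAMES):
    """Apply rules 1, 2, 4 and 5 to a list of (label, frames) pairs.

    Hypothetical helper: the data layout is an assumption, and rule 3
    ("fast break" half-court correction) is left out of this sketch.
    """
    relabeled = []
    carry = 0           # frames of invalid segments waiting for the next label
    carry_label = None  # label of the most recent carried segment
    for label, frames in segments:
        invalid = frames < min_frames                        # rule 1
        if invalid and label == "tracking":                  # rule 2
            label = "close-up"
        elif invalid and not label.startswith("fast break"):
            carry += frames                                  # rule 4
            carry_label = label
            continue
        relabeled.append((label, frames + carry))
        carry = 0
    if carry:
        # Assumption: a trailing invalid segment has no next label, so keep it.
        relabeled.append((carry_label, carry))

    merged = []
    for label, frames in relabeled:                          # rule 5
        if merged and merged[-1][0] == label and label != "layup":
            merged[-1] = (label, merged[-1][1] + frames)
        else:
            merged.append((label, frames))
    return merged


if __name__ == "__main__":
    shot = [("tracking", 40), ("tracking", 10), ("close-up", 30),
            ("wipe", 5), ("layup", 20), ("layup", 25), ("stillness", 50)]
    print(postprocess(shot))
```

In this sketch the validity test uses each segment's original duration, so frames merged in from an invalid neighbor do not change whether the receiving segment is itself considered valid.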

Semantic Event Recognition 
[0044] The SVA module 212 recognizes semantic events at the shot level. 
Event transcription for each shot is provided by the SSSP module 218, which in 
this implementation utilizes HMMs. To this end, a completely connected HMM will 
have n states, one state for each dimension of the n-dimensional observation 
vector 216, as well as begin and exit states. All events are context dependent. 
Relationships between the energy values indicated by the n-dimensional 
observation vector(s) 216 and actual semantic events are determined via sentence 
grammar rules. For instance, shots extracted from the input video data 204 are 
considered analogous to sentences, and event clips are considered representative 
of words. Shot transcriptions are manually prepared based on defined events and 
the transition probabilities are calculated as follows: 

p(i,j) = N(i,j) / N(i),    N(i) ≠ 0    (3), 
wherein N(i) is the total number of occurrences of event i, and N(i,j) is the 
total number of co-occurrences of events i and j. 
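
As a hedged illustration of equation (3), the following sketch estimates transition probabilities from manually prepared shot transcriptions. It assumes that a "co-occurrence" of events i and j means event i immediately followed by event j within one shot; the text does not state this explicitly. The function name and the list-of-lists input format are hypothetical.

```python
from collections import Counter


def transition_probabilities(transcriptions):
    """Estimate p(i, j) = N(i, j) / N(i) from event transcriptions.

    transcriptions: one list of event labels per shot, in temporal order.
    Assumption: N(i, j) counts event i immediately followed by event j.
    """
    n_i = Counter()   # N(i): total occurrences of each event
    n_ij = Counter()  # N(i, j): adjacent-pair counts
    for events in transcriptions:
        n_i.update(events)
        n_ij.update(zip(events, events[1:]))
    # The N(i) != 0 guard mirrors the condition attached to equation (3).
    return {(i, j): count / n_i[i]
            for (i, j), count in n_ij.items() if n_i[i] != 0}


if __name__ == "__main__":
    shots = [["tracking", "layup", "wipe"],
             ["tracking", "shot", "wipe"]]
    for pair, p in sorted(transition_probabilities(shots).items()):
        print(pair, p)
```

Because N(i) counts every occurrence of event i, including occurrences at the end of a shot, the probabilities leaving a given event may sum to less than one under this reading.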

[0045] The SVA module 212 uses the known Viterbi algorithm to segment shots 
by maximum likelihood estimation; the transition probabilities represent a kind 
of posterior probability. (Probabilities are represented as respective portions 
of "other data" 222.) The product of the two probabilities is the final 
recognition probability. Finally, the concept transcription with the maximal 
recognition probability is regarded as the result, i.e., the semantic 
events/content 214. 
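
For orientation, a generic log-domain Viterbi decoder is sketched below. It is a textbook form of the algorithm, not the specific implementation of the SVA module 212. The inputs (per-frame log-likelihoods for each state, log transition probabilities, and log start probabilities) and the function name are assumptions. Adding log-probabilities corresponds to multiplying the likelihood and transition probabilities.

```python
import math


def viterbi(log_emit, log_trans, log_start):
    """Return (best_state_path, best_log_probability).

    log_emit[t][s]  : log-likelihood of observation t under state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_start[s]    : log probability of starting in state s
    All three inputs are assumed to be precomputed by the caller.
    """
    n_states = len(log_start)
    score = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for t in range(1, len(log_emit)):
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best_r = max(range(n_states),
                         key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best_r)
            score.append(prev[best_r] + log_trans[best_r][s] + log_emit[t][s])
        back.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    log = math.log
    start = [log(0.5), log(0.5)]
    trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
    emit = [[log(0.9), log(0.1)], [log(0.9), log(0.1)],
            [log(0.1), log(0.9)], [log(0.1), log(0.9)]]
    print(viterbi(emit, trans, start))
```

Working in the log domain avoids numerical underflow on long shots; the state boundaries in the returned path give the segmentation.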

An Exemplary Procedure 

[0046] Fig. 8 shows an exemplary procedure 800 for representing sequential 
motion patterns in a video data source 204 as an n-dimensional observation 
vector 216. A statistical model for sequential pattern analysis is applied to the 
represented motion patterns to map semantics to the represented motion events. 
For purposes of discussion, the operations of the procedure are also described in 
reference to the features of Figs. 1 and 2, and the operations of blocks 802 
through 808 of the procedure are implemented by the SMPR module 202, whereas 
the operations of block 810 are implemented by the SVA module 212. However, in 
a different implementation, these operations can be collectively or 
independently implemented by any combination of one or more computer-program 
modules (e.g., the SMPR module 202 and/or the SVA module 212). 

[0047] Referring to Fig. 8, at block 802 the SMPR module 202 (Fig. 2) uses 
a sliding window to apply each motion filter 208 to the input video sequence 204 
to generate ER measurements 206 according to equation (1). At block 804, the 
SMPR module 202 applies each motion filter 208 to the energy redistribution 
measurements 206. This produces the temporal motion sequences 210 according to 
equation (2). The temporal sequences represent motion events quantitatively. At 
block 806, the SMPR module 202 calculates, for each motion filter applied to the 
ER measurements 206, a respective mean of the energy value sequence generated 
by application of the filter. At block 808, the SMPR module 202 combines the 
mean energy values into an n-dimensional observation vector, wherein n is the 
number of motion filters 208 multiplied by a factor of two (2) to indicate 
consideration of original input and difference temporal sequences 210. At block 
810, the SVA module 212 analyzes the n-dimensional observation vector(s) 216 via 
the sequential pattern analysis module 218, which in this implementation utilizes 
HMMs, to map the semantic events to the motion patterns represented in the 
observation vector 216. 
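
Blocks 806 and 808 can be sketched as follows, under stated assumptions. The sketch takes the temporal sequences 210 within one window as given (equations (1) and (2) are not reproduced here), treats the "difference" sequence as the first-order difference of consecutive values, and places the original means before the difference means. The function name, the ordering, and the difference definition are all hypothetical choices made for illustration.

```python
def observation_vector(temporal_sequences):
    """Build a 2 * (number of filters) dimensional observation vector.

    temporal_sequences: one list of energy values per motion filter,
    all taken from the same window. Assumption: the difference sequence
    is the first-order difference of consecutive energy values.
    """
    means = []
    diff_means = []
    for seq in temporal_sequences:
        diffs = [b - a for a, b in zip(seq, seq[1:])]
        means.append(sum(seq) / len(seq))
        # A single-sample window has no differences; use 0.0 in that case.
        diff_means.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return means + diff_means


if __name__ == "__main__":
    window = [[1.0, 2.0, 3.0],    # response of a first motion filter
              [4.0, 4.0, 10.0]]   # response of a second motion filter
    print(observation_vector(window))
```

With two filters the resulting vector has four components, matching n equal to the number of motion filters multiplied by two.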

Conclusion 

[0048] The described systems and methods represent sequential motion 
patterns. Although the systems and methods have been described in language 
specific to structural features and methodological operations, the subject matter as 
defined in the appended claims is not necessarily limited to the specific features 
or operations described. For instance, represented sequential motion patterns have 
been described as having been derived from a video data input source 204. 
However, the described systems and methods for representing sequential motion 
patterns by calculating energy redistribution (ER) between representative 
components, and generating an n-dimensional observation vector by applying 
motion-specific filters to the ER calculations, can be applied to other 
spatio-temporal data sources such as object trajectories in 2D+t space and 
sequences of color histogram differences. Moreover, although HMMs were used to 
map the represented motion patterns to semantic events, other sequential analysis 
tools could also have been used. Accordingly, the specific features and operations 
are disclosed as exemplary forms of implementing the claimed subject matter. 